Distributions and Hypothesis Testing

Binomial distribution

The binomial distribution is a discrete probability distribution. The conditions for a binomial distribution are:

  • fixed number of trials
  • each trial has two outcomes: success or failure
  • each trial is independent
  • each trial has a constant probability of success

A variable $X$ distributed binomially with number of trials $n$ and probability of success $p$ is denoted:

$$X \sim B(n, p)$$

For a given $x$, the probability is given by:

$$P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}$$
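As an illustration, the binomial probability $P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}$ can be evaluated directly. The following is a minimal sketch using only the Python standard library; the function name `binomial_pmf` and the coin-toss numbers are illustrative assumptions, not part of the notes.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Return P(X = x) for X ~ B(n, p)."""
    if not 0 <= x <= n:
        return 0.0  # outcomes outside 0..n are impossible
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Hypothetical example: exactly 3 heads in 10 tosses of a fair coin.
prob = binomial_pmf(3, 10, 0.5)
print(prob)  # 120 / 1024 = 0.1171875
```

Summing `binomial_pmf(k, n, p)` over a range of `k` gives cumulative probabilities such as $P(X \le x)$.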

Normal approximation

A binomial distribution $X \sim B(n, p)$ can be approximated by a normal distribution with parameters:

$$\mu = np, \quad \sigma^2 = np(1 - p)$$

This holds for sufficiently large $n$, such that $np > 5$ and $n(1 - p) > 5$.
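To see how close the approximation can be, the sketch below compares an exact binomial cumulative probability with the normal approximation using mean $np$ and variance $np(1 - p)$. The values $n = 100$, $p = 0.4$ and the helper names are hypothetical, and the continuity correction (adding 0.5) is an assumption that the notes do not mention.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_cdf(x: int, n: int, p: float) -> float:
    """Exact P(X <= x) for X ~ B(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def normal_approx_cdf(x: int, n: int, p: float) -> float:
    """Approximate P(X <= x) using N(np, np(1 - p))."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    # Continuity correction: P(X <= x) for discrete X becomes P(Y < x + 0.5).
    return NormalDist(mu, sigma).cdf(x + 0.5)


n, p = 100, 0.4  # hypothetical values; np = 40 and n(1 - p) = 60
exact = binomial_cdf(45, n, p)
approx = normal_approx_cdf(45, n, p)
print(f"exact = {exact:.4f}, approx = {approx:.4f}")
```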

Binomial hypothesis testing

A hypothesis test is used to determine whether the value of a population parameter, such as the population mean or standard deviation, has changed from an assumed one.

When hypothesis testing the binomial distribution, the parameter being tested is the probability or proportion of success, $p$:

  • State the null ($H_0$) and alternative ($H_1$) hypotheses, defining any parameters.
  • State the distribution that the test statistic should follow.
  • Calculate the probability of observing the given value of the test statistic or a more extreme one, finding the $p$-value.
  • Compare the $p$-value to the significance level. For two-tailed tests, compare the $p$-value to half the significance level.
  • If the $p$-value is less than the significance level, reject the null hypothesis.
  • State the conclusion in context, making sure to not use overly certain language, e.g. "there is sufficient evidence to suggest..."
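The steps above can be sketched for a one-tailed (upper) test. Everything in the scenario is a hypothetical assumption chosen for illustration: a coin is tossed 20 times and lands heads 15 times, with $H_0: p = 0.5$, $H_1: p > 0.5$, and a 5% significance level.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Return P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def upper_tail_p_value(observed: int, n: int, p0: float) -> float:
    """P(X >= observed) under H0, where X ~ B(n, p0)."""
    return sum(binomial_pmf(k, n, p0) for k in range(observed, n + 1))


# Hypothetical data: H0: p = 0.5, H1: p > 0.5, X ~ B(20, 0.5) under H0.
n, p0, observed, alpha = 20, 0.5, 15, 0.05

p_value = upper_tail_p_value(observed, n, p0)
reject = p_value < alpha

print(f"p-value = {p_value:.4f}")
if reject:
    print("Sufficient evidence at the 5% level to suggest p > 0.5.")
else:
    print("Insufficient evidence at the 5% level to suggest p > 0.5.")
```

Here the $p$-value is $21700 / 2^{20} \approx 0.0207$, which is below 0.05, so the null hypothesis is rejected in this made-up example.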

Alternatively, the critical region can be used. The critical region is the set of values of the test statistic that should lead to rejecting the null hypothesis, bounded by the critical values. The other values form the acceptance region.

The actual significance level is the probability of incorrectly rejecting the null hypothesis. It is the probability of the test statistic falling in the critical region when the null hypothesis is true, and is always less than or equal to the stated significance level.
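A minimal sketch of finding an upper-tail critical region and its actual significance level follows. The scenario ($X \sim B(20, 0.5)$ under the null hypothesis, 5% significance level) and the function name `upper_critical_region` are hypothetical assumptions.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Return P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def upper_critical_region(n: int, p0: float, alpha: float):
    """Return (critical value c, P(X >= c)) for the largest upper-tail
    region whose probability under H0 does not exceed alpha.

    Returns (None, 0.0) if even P(X = n) exceeds alpha.
    """
    tail = 0.0
    critical = None
    # Grow the region downwards from X = n while it still fits within alpha.
    for c in range(n, -1, -1):
        new_tail = tail + binomial_pmf(c, n, p0)
        if new_tail > alpha:
            break
        tail = new_tail
        critical = c
    return critical, tail


critical, actual_level = upper_critical_region(20, 0.5, 0.05)
print(f"critical region: X >= {critical}")
print(f"actual significance level = {actual_level:.4f}")
```

In this example the critical region is $X \ge 15$ with an actual significance level of about 0.0207. Including $X = 14$ would raise the tail probability to about 0.0577, which exceeds 5%.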

Normal distribution

The normal distribution is a continuous probability distribution that models many real-world situations. A variable $X$ distributed normally with mean $\mu$ and variance $\sigma^2$ is denoted:

$$X \sim N(\mu, \sigma^2)$$

The following facts about the normal distribution are expected (IS):

  • About two-thirds of values lie in the range $\mu \pm \sigma$.
  • About 95% of values lie in the range $\mu \pm 2\sigma$.
  • Almost all (99.7%) of values lie in the range $\mu \pm 3\sigma$.
  • The points of inflection of the normal curve lie at $x = \mu \pm \sigma$.
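The proportions within one, two, and three standard deviations of the mean can be checked numerically. This is a small sketch using `statistics.NormalDist` from the Python standard library; the helper name `prob_within` is an illustrative assumption.

```python
from statistics import NormalDist


def prob_within(k: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Return P(mu - k*sigma < X < mu + k*sigma) for X ~ N(mu, sigma^2)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {prob_within(k):.4f}")
```

The results are approximately 0.6827, 0.9545, and 0.9973, and they do not depend on the particular values of the mean and standard deviation.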

Z-score

The Z-score is the number of standard deviations that a value lies above the mean:

$$Z = \frac{X - \mu}{\sigma}, \quad Z \sim N(0, 1)$$

The Z-score can be used to find unknown means or standard deviations.
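As a sketch of how this works, suppose $X \sim N(\mu, 4^2)$ with $\mu$ unknown and $P(X < 20) = 0.9$ (hypothetical numbers). The Z-score for a cumulative probability of 0.9 comes from the inverse normal function, and rearranging $z = (x - \mu) / \sigma$ gives $\mu = x - z\sigma$. The function names below are illustrative assumptions.

```python
from statistics import NormalDist


def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that x lies above the mean."""
    return (x - mu) / sigma


def mean_from_probability(x: float, prob_below: float, sigma: float) -> float:
    """Find mu given sigma and P(X < x) = prob_below for X ~ N(mu, sigma^2)."""
    z = NormalDist().inv_cdf(prob_below)  # Z-score with P(Z < z) = prob_below
    return x - z * sigma  # rearranged from z = (x - mu) / sigma


# Hypothetical example: sigma = 4 and P(X < 20) = 0.9.
mu = mean_from_probability(20, 0.9, 4)
print(f"mu = {mu:.3f}")
print(f"check: P(X < 20) = {NormalDist(mu, 4).cdf(20):.3f}")
```

An unknown standard deviation can be found the same way by rearranging to $\sigma = (x - \mu) / z$.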

Normal hypothesis testing

The sample mean, denoted $\bar{X}$, or possibly $\bar{x}$, is a random variable described by a distribution for the mean of $n$ observations. The sample mean is distributed differently to the original normal:

$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$

The variance is divided by $n$, so the standard deviation is divided by $\sqrt{n}$.

When hypothesis testing the normal distribution, the parameter being tested is the population mean, $\mu$.

  • State the null ($H_0$) and alternative ($H_1$) hypotheses, defining any parameters.
  • State the distribution that the test statistic should follow.
  • Use the distribution to find the probability of an outcome as extreme or more extreme occurring, finding the $p$-value. Compare the $p$-value against the significance level.
  • Alternatively, find the critical region for a given significance level and check if the data lies within the critical region.
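The steps above can be sketched for a one-tailed (upper) test of a mean. The scenario is a hypothetical assumption: the population standard deviation is known to be 4, a sample of 16 observations has mean 52, and the hypotheses are $H_0: \mu = 50$ against $H_1: \mu > 50$ at the 5% significance level.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical data for the test.
mu0 = 50.0          # population mean under H0
sigma = 4.0         # known population standard deviation
n = 16              # sample size
sample_mean = 52.0  # observed sample mean
alpha = 0.05        # significance level

# Under H0 the sample mean follows N(mu0, sigma^2 / n).
sampling_dist = NormalDist(mu0, sigma / sqrt(n))

# p-value approach: probability of a sample mean at least this large.
p_value = 1 - sampling_dist.cdf(sample_mean)
reject_by_p_value = p_value < alpha

# Critical region approach: sample means above this value lead to rejection.
critical_value = sampling_dist.inv_cdf(1 - alpha)
reject_by_region = sample_mean > critical_value

print(f"p-value = {p_value:.4f}")
print(f"critical region: sample mean > {critical_value:.3f}")
print("reject H0" if reject_by_p_value else "do not reject H0")
```

Both approaches give the same decision. In this example the sample mean is two standard errors above 50, giving a $p$-value of about 0.0228.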

Correlation coefficient

When testing whether there is correlation between two variables, the parameter being tested is the population correlation coefficient, $\rho$, by looking at values of the sample correlation coefficient, $r$:

  • State the null ($H_0$) and alternative ($H_1$) hypotheses. These will be in terms of $\rho$.
  • Look up in the table the critical value for $r$ for the given significance level, sample size, and tail type.
  • Compare $r$ against the critical value. If $r$ is more extreme than the critical value (that is, $|r|$ is greater, in the direction of $H_1$), then reject $H_0$.
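A minimal sketch of this comparison follows, assuming the sample correlation coefficient is the product moment (Pearson) coefficient. The data, the function names, and the hypotheses ($H_0: \rho = 0$, $H_1: \rho > 0$) are hypothetical. The critical value 0.5494 is the usual tabulated figure for a sample size of 10 in a one-tailed test at the 5% level, but it should be checked against whichever table is in use.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Sample product moment correlation coefficient of paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


def reject_null(r, critical_value, tail):
    """Compare r with a positive critical value taken from a table."""
    if tail == "upper":  # H1: rho > 0
        return r > critical_value
    if tail == "lower":  # H1: rho < 0
        return r < -critical_value
    if tail == "two":    # H1: rho != 0
        return abs(r) > critical_value
    raise ValueError("tail must be 'upper', 'lower' or 'two'")


# Hypothetical paired data with sample size 10.
hours = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
scores = [52, 55, 51, 60, 58, 65, 63, 70, 68, 75]

r = pearson_r(hours, scores)
critical = 0.5494  # assumed table value: n = 10, 5% level, one-tailed
print(f"r = {r:.4f}")
print("reject H0" if reject_null(r, critical, "upper") else "do not reject H0")
```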